Technology of Text Mining
Internal identifier: 001A14 (Main/Exploration)
Authors: Ari Visa [Finland]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2001.
Abstract
A large amount of information is stored in databases, in intranets, or on the Internet. This information is organised either in documents or in text documents; the distinction depends on whether pictures, tables, figures, and formulas are included. The common problem is to find a desired piece of information, a trend, or an undiscovered pattern in these sources. The problem is not a new one. Traditionally, it has been considered under the heading of information seeking, that is, the science of how to find a book in a library. It has traditionally been solved either by classifying and accessing documents with the Dewey Decimal Classification system or by assigning a number of characteristic keywords. The difficulty is that nowadays there are many unclassified documents in company databases, in intranets, and on the Internet. First, some terms are defined. Text filtering is an information-seeking process in which documents are selected from a dynamic text stream. Text mining is the process of analysing text to extract information from it for particular purposes. Text categorisation is the process of clustering similar documents from a large document set. These terms overlap to some degree. Text mining, also known as document information mining, text data mining, or knowledge discovery in textual databases, is an emerging technology for analysing large collections of unstructured documents in order to extract interesting and non-trivial patterns or knowledge. Typical subproblems that have been solved are language identification, feature selection/extraction, clustering, natural language processing, summarisation, categorisation, search, indexing, and visualisation. These subproblems are discussed in detail and the most common approaches are given. Finally, some examples of current uses of text mining are given and some potential application areas are mentioned.
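The abstract's definition of text filtering (selecting documents from a dynamic text stream) can be illustrated with a minimal Python sketch. It assumes a simple keyword-overlap interest profile; the helpers `tokenize` and `filter_stream` and the `min_hits` threshold are hypothetical names introduced here for illustration, and this is not the method described in the paper, which the abstract does not specify.

```python
import re
from typing import Iterable, Iterator


def tokenize(text: str) -> set[str]:
    """Return the set of lower-cased alphanumeric tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_stream(docs: Iterable[str], profile: set[str], min_hits: int = 1) -> Iterator[str]:
    """Hypothetical keyword-profile filter.

    Lazily yields each incoming document that shares at least `min_hits`
    distinct terms with the interest profile; the stream itself is never stored,
    which mirrors the "dynamic text stream" setting of text filtering.
    """
    wanted = {term.lower() for term in profile}
    for doc in docs:
        if len(tokenize(doc) & wanted) >= min_hits:
            yield doc


if __name__ == "__main__":
    # An iterator stands in for a live feed of incoming documents.
    incoming = iter([
        "Quarterly sales report for the Tampere office",
        "New clustering method for text mining of patents",
        "Cafeteria menu for next week",
    ])
    for hit in filter_stream(incoming, {"text", "mining", "clustering"}, min_hits=2):
        print(hit)  # prints: New clustering method for text mining of patents
```

Because `filter_stream` is a generator, it can sit on top of an unbounded feed; a realistic filter would replace raw keyword overlap with weighted features and a learned or tuned threshold.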
Url: https://api.istex.fr/document/A9D55CDEED0425A739C61C52479F43C882308A8B/fulltext/pdf
DOI: 10.1007/3-540-44596-X_1
Affiliations: Tampere University of Technology, P.O. Box 553, FIN-33101 Tampere, Finland
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000317
- to stream Istex, to step Curation: 000312
- to stream Istex, to step Checkpoint: 001068
- to stream Main, to step Merge: 001B07
- to stream Main, to step Curation: 001A14
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Technology of Text Mining</title>
<author><name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1007/3-540-44596-X_1</idno>
<idno type="url">https://api.istex.fr/document/A9D55CDEED0425A739C61C52479F43C882308A8B/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000317</idno>
<idno type="wicri:Area/Istex/Curation">000312</idno>
<idno type="wicri:Area/Istex/Checkpoint">001068</idno>
<idno type="wicri:doubleKey">0302-9743:2001:Visa A:technology:of:text</idno>
<idno type="wicri:Area/Main/Merge">001B07</idno>
<idno type="wicri:Area/Main/Curation">001A14</idno>
<idno type="wicri:Area/Main/Exploration">001A14</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Technology of Text Mining</title>
<author><name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
<affiliation wicri:level="1"><country xml:lang="fr">Finlande</country>
<wicri:regionArea>Tampere University of Technology, FIN-33101, P.O. Box 553, Tampere</wicri:regionArea>
<wicri:noRegion>Tampere</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Finlande</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2001</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">A9D55CDEED0425A739C61C52479F43C882308A8B</idno>
<idno type="DOI">10.1007/3-540-44596-X_1</idno>
<idno type="ChapterID">1</idno>
<idno type="ChapterID">Chap1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: A large amount of information is stored in databases, in intranets, or on the Internet. This information is organised either in documents or in text documents; the distinction depends on whether pictures, tables, figures, and formulas are included. The common problem is to find a desired piece of information, a trend, or an undiscovered pattern in these sources. The problem is not a new one. Traditionally, it has been considered under the heading of information seeking, that is, the science of how to find a book in a library. It has traditionally been solved either by classifying and accessing documents with the Dewey Decimal Classification system or by assigning a number of characteristic keywords. The difficulty is that nowadays there are many unclassified documents in company databases, in intranets, and on the Internet. First, some terms are defined. Text filtering is an information-seeking process in which documents are selected from a dynamic text stream. Text mining is the process of analysing text to extract information from it for particular purposes. Text categorisation is the process of clustering similar documents from a large document set. These terms overlap to some degree. Text mining, also known as document information mining, text data mining, or knowledge discovery in textual databases, is an emerging technology for analysing large collections of unstructured documents in order to extract interesting and non-trivial patterns or knowledge. Typical subproblems that have been solved are language identification, feature selection/extraction, clustering, natural language processing, summarisation, categorisation, search, indexing, and visualisation. These subproblems are discussed in detail and the most common approaches are given. Finally, some examples of current uses of text mining are given and some potential application areas are mentioned.</div>
</front>
</TEI>
<affiliations><list><country><li>Finlande</li>
</country>
</list>
<tree><country name="Finlande"><noRegion><name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</noRegion>
<name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</country>
</tree>
</affiliations>
</record>
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A14 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001A14 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B |texte= Technology of Text Mining }}
This area was generated with Dilib version V0.6.32.